
    Incremental Algorithms for Effective and Efficient Query Recommendation

    Query recommender systems give users hints on possible interesting queries relative to their information needs. Most query recommenders are based on static knowledge models built on the basis of past user behaviors recorded in query logs. These models should be periodically updated, or rebuilt from scratch, to keep up with the possible variations in the interests of users. We study query recommender algorithms that generate suggestions on the basis of models that are updated continuously, each time a new query is submitted. We extend two state-of-the-art query recommendation algorithms and evaluate the effects of continuous model updates on their effectiveness and efficiency. Tests conducted on an actual query log show that contrasting model aging by continuously updating the recommendation model is a viable and effective solution.
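
    As a minimal illustration of continuously updated recommendation models (not the two algorithms extended in the paper), the sketch below keeps a query co-occurrence model that is refreshed as each new query is submitted; the class, scoring rule, and example queries are all invented for the example.

```python
from collections import defaultdict

class IncrementalQueryRecommender:
    """Toy query recommender whose model is refreshed on every submitted query,
    instead of being rebuilt periodically from a query log.  Illustrative only;
    not the two algorithms extended in the paper."""

    def __init__(self):
        # query -> {follow-up query: number of times it followed in some session}
        self.followers = defaultdict(lambda: defaultdict(int))
        self.last_query = {}  # session id -> most recent query in that session

    def submit(self, session_id, query):
        """Record a newly submitted query and update the model immediately."""
        prev = self.last_query.get(session_id)
        if prev is not None:
            self.followers[prev][query] += 1
        self.last_query[session_id] = query

    def recommend(self, query, k=3):
        """Suggest the k queries that most often followed `query` so far."""
        counts = self.followers.get(query, {})
        return sorted(counts, key=counts.get, reverse=True)[:k]

rec = IncrementalQueryRecommender()
rec.submit("s1", "cheap flights")
rec.submit("s1", "cheap flights rome")
rec.submit("s2", "cheap flights")
rec.submit("s2", "rome hotels")
print(rec.recommend("cheap flights"))  # ['cheap flights rome', 'rome hotels']
```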

    Ranking and clustering of nodes in networks with smart teleportation

    Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering.
    Comment: 10 pages, 7 figures
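
    A rough sketch of how the choice of teleportation target changes the ranking, assuming a small dense adjacency matrix and using an in-link-weighted teleportation vector as a stand-in for the authors' link teleportation; this is not their exact formulation.

```python
import numpy as np

def pagerank(adj, alpha=0.85, teleport_to_links=True, n_iter=200):
    """Power-iteration PageRank on a dense 0/1 adjacency matrix (adj[i, j] = 1
    means a link i -> j).  With teleport_to_links=True the teleportation vector
    is proportional to in-link counts rather than uniform, a rough stand-in for
    the 'teleport to links' idea, not the authors' exact formulation."""
    n = adj.shape[0]
    adj = adj.astype(float)
    out_deg = adj.sum(axis=1)
    # Row-stochastic walk matrix; dangling nodes redistribute uniformly.
    P = np.where(out_deg[:, None] > 0, adj / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    in_deg = adj.sum(axis=0)
    v = in_deg / in_deg.sum() if teleport_to_links else np.full(n, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = alpha * (r @ P) + (1 - alpha) * v
    return r

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(pagerank(A, teleport_to_links=False))  # standard uniform teleportation
print(pagerank(A, teleport_to_links=True))   # teleportation weighted by in-links
```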

    TimeMachine: Timeline Generation for Knowledge-Base Entities

    We present a method called TIMEMACHINE to generate a timeline of events and relations for entities in a knowledge base. For example, for an actor, such a timeline should show the most important professional and personal milestones and relationships such as works, awards, collaborations, and family relationships. We develop three orthogonal timeline quality criteria that an ideal timeline should satisfy: (1) it shows events that are relevant to the entity; (2) it shows events that are temporally diverse, so they distribute along the time axis, avoiding visual crowding and allowing for easy user interaction, such as zooming in and out; and (3) it shows events that are content diverse, so they contain many different types of events (e.g., for an actor, it should show movies and marriages and awards, not just movies). We present an algorithm to generate such timelines for a given time period and screen size, based on submodular optimization and web co-occurrence statistics with provable performance guarantees. A series of user studies using Mechanical Turk shows that all three quality criteria are crucial to produce quality timelines and that our algorithm significantly outperforms various baseline and state-of-the-art methods.
    Comment: To appear at ACM SIGKDD KDD'15. 12pp, 7 fig. With appendix. Demo and other info available at http://cs.stanford.edu/~althoff/timemachine
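
    The toy greedy selection below only illustrates how the three criteria (relevance, temporal diversity, content diversity) can be traded off when picking events; it is not the paper's submodular objective, and the weights and event tuples are invented.

```python
def greedy_timeline(events, k, w_time=1.0, w_type=1.0):
    """Greedily pick k events balancing relevance, temporal spread, and type
    diversity.  Each event is a (year, type, relevance) tuple.  This is only a
    toy illustration of the three criteria, not the paper's submodular objective."""
    chosen = []
    while len(chosen) < min(k, len(events)):
        best, best_gain = None, float("-inf")
        for e in events:
            if e in chosen:
                continue
            year, etype, rel = e
            # Temporal diversity: distance (in years) to the closest chosen event.
            t_gain = min((abs(year - y) for y, _, _ in chosen), default=50)
            # Content diversity: reward event types not yet on the timeline.
            d_gain = 0 if any(t == etype for _, t, _ in chosen) else 1
            gain = rel + w_time * t_gain / 50.0 + w_type * d_gain
            if gain > best_gain:
                best, best_gain = e, gain
        chosen.append(best)
    return sorted(chosen)

events = [(1994, "movie", 0.9), (1995, "movie", 0.8), (2003, "award", 0.7),
          (2001, "marriage", 0.4), (2010, "movie", 0.6)]
print(greedy_timeline(events, k=3))
# Picks the top movie, then the award and the marriage rather than more movies.
```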

    Fast Searching in Packed Strings

    Given strings $P$ and $Q$ the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
    Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
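
    For reference, a plain character-by-character Knuth-Morris-Pratt matcher, the $O(n)$ baseline that the packed algorithm improves for small $m$; the packed, tabulation-based automaton itself is not reproduced here.

```python
def kmp_search(P, Q):
    """Classical Knuth-Morris-Pratt matching: return all starting positions of
    P in Q in O(|P| + |Q|) time.  This is the unpacked baseline; the packed
    word-RAM automaton from the paper is not shown."""
    m = len(P)
    # failure[i] = length of the longest proper border of P[:i + 1]
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and P[i] != P[k]:
            k = failure[k - 1]
        if P[i] == P[k]:
            k += 1
        failure[i] = k
    occurrences, k = [], 0
    for i, c in enumerate(Q):
        while k > 0 and c != P[k]:
            k = failure[k - 1]
        if c == P[k]:
            k += 1
        if k == m:
            occurrences.append(i - m + 1)
            k = failure[k - 1]
    return occurrences

print(kmp_search("aba", "ababababa"))  # [0, 2, 4, 6]
```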

    Revisiting the Problem of Searching on a Line

    We revisit the problem of searching for a target at an unknown location on a line when given upper and lower bounds on the distance D that separates the initial position of the searcher from the target. Prior to this work, only asymptotic bounds were known for the optimal competitive ratio achievable by any search strategy in the worst case. We present the first tight bounds on the exact optimal competitive ratio achievable, parameterized in terms of the given bounds on D, along with an optimal search strategy that achieves this competitive ratio. We prove that this optimal strategy is unique. We characterize the conditions under which an optimal strategy can be computed exactly and, when it cannot, we explain how numerical methods can be used efficiently. In addition, we answer several related open questions, including the maximal reach problem, and we discuss how to generalize these results to m rays, for any m >= 2.
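
    For context, a sketch of the classical doubling (zig-zag) strategy for the unbounded version of the problem, which is 9-competitive in the worst case; the paper's optimal strategy for bounded D is different and is not reproduced here.

```python
def doubling_search(target, step=1.0):
    """Classical doubling strategy for searching a line from the origin: sweep
    1 right, 2 left, 4 right, ... until the target is crossed, and return the
    total distance walked.  This is the well-known 9-competitive baseline for
    the unbounded problem, not the paper's optimal strategy for bounded D."""
    travelled = 0.0
    position = 0.0
    bound = step
    direction = 1
    while True:
        turn_point = direction * bound
        # Check whether the target lies on the segment we are about to sweep.
        lo, hi = min(position, turn_point), max(position, turn_point)
        if lo <= target <= hi:
            travelled += abs(target - position)
            return travelled
        travelled += abs(turn_point - position)
        position = turn_point
        # Return to the origin before sweeping the other side.
        travelled += abs(position)
        position = 0.0
        bound *= 2
        direction = -direction

print(doubling_search(3.5))  # distance walked by the strategy
print(abs(3.5))              # distance walked by an omniscient searcher
```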

    Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles

    In micro-blogging platforms, people connect and interact with others. However, due to cognitive biases, they tend to interact with like-minded people and read agreeable information only. Many efforts to make people connect with those who think differently have not worked well. In this paper, we hypothesize, first, that previous approaches have not worked because they have been direct -- they have tried to explicitly connect people with those having opposing views on sensitive issues. Second, that neither recommendation nor presentation of information by itself is enough to encourage behavioral change. We propose a platform that mixes a recommender algorithm and a visualization-based user interface to explore recommendations. It recommends politically diverse profiles in terms of distance of latent topics, and displays those recommendations in a visual representation of each user's personal content. We performed an "in the wild" evaluation of this platform, and found that people explored more recommendations when using a biased algorithm instead of ours. In line with our hypothesis, we also found that the mixture of our recommender algorithm and our user interface allowed politically interested users to exhibit an unbiased exploration of the recommended profiles. Finally, our results contribute insights on two aspects: first, which individual differences are important when designing platforms aimed at behavioral change; and second, which algorithms and user interfaces should be mixed to help users avoid cognitive mechanisms that lead to biased behavior.
    Comment: 12 pages, 7 figures. To be presented at ACM Intelligent User Interfaces 201
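
    A crude sketch of ranking profiles by latent-topic distance, only a stand-in for the paper's recommender; the topic vectors and the "most distant" selection rule are invented for illustration.

```python
import numpy as np

def diverse_profile_recommendations(user_topics, candidate_topics, k=3):
    """Illustrative sketch only: rank candidate profiles by cosine distance
    between latent-topic vectors and return the k most distant ones, a crude
    stand-in for recommending politically diverse profiles."""
    u = user_topics / np.linalg.norm(user_topics)
    C = candidate_topics / np.linalg.norm(candidate_topics, axis=1, keepdims=True)
    distances = 1.0 - C @ u                  # cosine distance to the user
    return np.argsort(distances)[-k:][::-1]  # indices of the most distant profiles

user = np.array([0.8, 0.1, 0.1])            # user mostly about topic 0
cands = np.array([[0.7, 0.2, 0.1],          # similar profile
                  [0.1, 0.8, 0.1],          # different profile
                  [0.1, 0.1, 0.8]])         # different profile
print(diverse_profile_recommendations(user, cands, k=2))  # the two farthest profiles
```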

    Predicting your next OLAP query based on recent analytical sessions

    In Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend it to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight into the recommendation precision and the parameters it depends on, we evaluate our approach using different quality assessments.
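
    A minimal sketch of a probabilistic next-query predictor, assuming a first-order Markov model estimated from past sessions; the paper's behavior model and query similarity metric are richer than this, and the query identifiers below are made up.

```python
from collections import defaultdict

def train_markov_model(sessions):
    """Estimate first-order transition probabilities P(next | current) from past
    OLAP sessions (each session is a list of query identifiers).  A minimal
    sketch of a probabilistic user-behavior model, not the paper's framework."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        model[cur] = {q: c / total for q, c in nexts.items()}
    return model

def predict_next(model, current_query):
    """Return the most likely next query, or None if the query was never seen."""
    candidates = model.get(current_query)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

sessions = [["q_sales_by_year", "q_sales_by_year_region", "q_top_products"],
            ["q_sales_by_year", "q_sales_by_year_region"]]
model = train_markov_model(sessions)
print(predict_next(model, "q_sales_by_year"))  # q_sales_by_year_region
```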

    Exploiting the Social Capital of Folksonomies for Web Page Classification

    Dynamic Set Intersection

    Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S, S' \in F$, report every member of $S \cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O\left(\frac{d}{w/\log^2 w}\right)$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O\left(m+\frac{m\alpha}{w/\log^2 w} + t\right)$ expected time, where $m=|E|$ and $\alpha$ is the arboricity of $G$. This improves a 30-year old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m\alpha)$ time. We provide an incremental data structure on $F$ that supports intersection witness queries, where we only need to find one $e \in S \cap S'$. Both queries and insertions take $O\left(\sqrt{\frac{N}{w/\log^2 w}}\right)$ expected time, where $N = \sum_{S \in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt{M \log N})$ expected time, each reporting query costs $O\left(\frac{N\sqrt{\log N}}{\sqrt{M}}\sqrt{op+1}\right)$ expected time where $op$ is the size of the output, and each witness query costs $O\left(\frac{N\sqrt{\log N}}{\sqrt{M}} + \log N\right)$ expected time.
    Comment: Accepted to WADS 201
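
    A toy illustration of the word-packing idea behind these bounds: over a small universe a set fits in a single machine word, so intersection reporting is one AND plus bit extraction. The paper's structure handles arbitrary universes (hashing elements into packed words) and achieves the stated bounds; that is not reproduced here.

```python
class PackedSet:
    """Toy word-packed set over a small universe {0, ..., u-1}: elements are
    bits of a machine word, so intersections are single AND operations.  This
    sketches only the packing idea, not the paper's data structure."""

    def __init__(self, elements=()):
        self.bits = 0
        for e in elements:
            self.insert(e)

    def insert(self, e):
        self.bits |= 1 << e

    def delete(self, e):
        self.bits &= ~(1 << e)

    def intersect(self, other):
        """Report every element of the intersection."""
        common = self.bits & other.bits
        out = []
        while common:
            low = common & -common           # isolate the lowest set bit
            out.append(low.bit_length() - 1)
            common ^= low
        return out

S = PackedSet([1, 4, 7, 9])
T = PackedSet([4, 5, 9])
print(S.intersect(T))  # [4, 9]
```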

    Expected length of the longest common subsequence for large alphabets

    We consider the length $L$ of the longest common subsequence of two $n$-character words chosen uniformly and independently at random over a $k$-ary alphabet. Subadditivity arguments yield that the expected value of $L$, normalized by $n$, converges to a constant $C_k$. We prove a conjecture of Sankoff and Mainville from the early 80's claiming that $C_k\sqrt{k}$ goes to 2 as $k$ goes to infinity.
    Comment: 14 pages, 1 figure, LaTeX
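
    A quick way to look at the constant $C_k$ empirically: estimate $E[L]/n$ by Monte Carlo using the standard quadratic LCS dynamic program. The sample sizes below are arbitrary; for large $k$ the estimate times $\sqrt{k}$ should be in the ballpark of 2.

```python
import random

def lcs_length(a, b):
    """Standard O(n^2) dynamic program for the longest common subsequence length."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0
        for j, y in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def estimate_Ck(k, n=500, trials=10):
    """Monte Carlo estimate of E[L]/n for two random n-character words over a
    k-ary alphabet (an empirical look at C_k; parameters are arbitrary)."""
    total = 0
    for _ in range(trials):
        a = [random.randrange(k) for _ in range(n)]
        b = [random.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)

print(estimate_Ck(k=2))          # around 0.8 for a binary alphabet
print(estimate_Ck(k=100) * 10)   # times sqrt(k); should be roughly 2 for large k
```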